AIbase

# Multimodal image-text understanding

Qwen2.5 VL 3B Instruct GGUF
Qwen2.5-VL-3B-Instruct is a 3B-parameter multimodal model supporting image-text generation tasks, specifically optimized for vision capabilities in llama.cpp.
Text-to-Image, English
Mungert
10.44k
8
Gme Qwen2 VL 2B Instruct GGUF
A quantized version of a multimodal model that supports both English and Chinese, suitable for image-text-to-text tasks.
Image-to-Text, Supports Multiple Languages
sinequa
350
0
© 2025 AIbase